Introduce key/value map bench #1121
jtescher merged 1 commit into open-telemetry:main from shaun-cox:ehm
Codecov Report
Patch coverage has no change and project coverage change:
Additional details and impacted files
@@ Coverage Diff @@
## main #1121 +/- ##
=======================================
- Coverage 50.5% 49.8% -0.7%
=======================================
Files 168 171 +3
Lines 19893 20171 +278
=======================================
+ Hits 10060 10061 +1
- Misses 9833 10110 +277
Awesome work! It makes a lot of sense to me that, even if duplicate detection is necessary, the data sizes are such that a simple sequential span might still be the optimal way to store data.
I really like the idea of removing the duplicate detection logic from the API/SDK (i.e., switching from HashMaps to the OneVec/TwoVec ideas presented). If there is a strong need to de-dup, it can be added as an optional span/log processor or in the OTLPExporter itself, so the main user thread won't pay any cost. The spec actually allows flexibility in terms of where the de-dup should occur: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute-collections (This topic was discussed in the SIG call on 06/20 as well, but after re-reading the spec I think it totally makes sense for this SIG to remove the costly de-dup.)
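To make the suggestion above concrete, here is a minimal sketch of de-duplication done as a separate pass, for example inside a processor or exporter, rather than on every insert. The function name `dedup_last_wins`, the use of `String` for both keys and values, and the last-write-wins policy are assumptions for illustration; none of this is an existing SDK API.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not an SDK API): collapse duplicate keys in one pass.
/// Assumed policy: the last value written for a key wins, and the key keeps
/// the position of its first appearance.
fn dedup_last_wins(attrs: Vec<(String, String)>) -> Vec<(String, String)> {
    // Maps each key to its index in `out`.
    let mut slot: HashMap<String, usize> = HashMap::with_capacity(attrs.len());
    let mut out: Vec<(String, String)> = Vec::with_capacity(attrs.len());
    for (key, value) in attrs {
        match slot.get(&key).copied() {
            // Key seen before: overwrite the stored value in place.
            Some(i) => out[i].1 = value,
            // First time we see this key: remember where it lives.
            None => {
                slot.insert(key.clone(), out.len());
                out.push((key, value));
            }
        }
    }
    out
}
```

The hashing cost is still paid, but only once per batch and off the thread that records attributes.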
Two new benchmarks:
- `key_value_map` for evaluating different implementation choices
- `span_builder` to focus on performance analysis of `SpanBuilder`
Overview
There are four implementations considered:
- `EvictedHashMap`: currently used by `SpanData` to carry span attributes
- `IndexMap`: currently used by `SpanBuilder` to carry span attributes (in an `OrderMap`)
- `OneVec`: `Vec<(Key, Value)>` where no hashing or duplicate key detection takes place
- `TwoVec`: a `Vec<Key>` and a `Vec<Value>`, linked by shared indices, where no hashing or duplicate key detection takes place
There are two main operations to consider for each implementation:
- `lookup`: find two attributes in "the map"
- `populate`: populate `n` (2, 8, or 32) attributes in the map
`lookup` approximates what a sampling decision might do in consulting specific keys present in the `SpanBuilder`. Its performance looks like the following: (Note: the `Vec`-based implementations are pessimized for the worst case, finding the last two attributes in the map.)
`OneVec` beats `IndexMap` for 2 and 8 attributes in the map, but loses for 32 attributes.
`populate` approximates what anyone who wants to create a `Span` must pay to get attributes into the `SpanBuilder`. Its performance looks like the following:
As expected, populating hash maps is a lot more expensive than populating vectors, but that's the cost of detecting duplicates.
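For readers who want to picture the two vector layouts, the following is a rough sketch, not the benchmark code from this PR. `String` stands in for the real `Key` and `Value` types, and the method names `insert` and `get` are assumptions chosen for illustration.

```rust
/// Sketch of the `OneVec` layout: a single vector of (key, value) pairs.
#[derive(Default)]
struct OneVec {
    entries: Vec<(String, String)>,
}

impl OneVec {
    /// Populate: append only, with no hashing and no duplicate detection.
    fn insert(&mut self, key: &str, value: &str) {
        self.entries.push((key.to_owned(), value.to_owned()));
    }

    /// Lookup: forward linear scan; returns the first entry whose key matches.
    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .iter()
            .find(|(k, _)| k.as_str() == key)
            .map(|(_, v)| v.as_str())
    }
}

/// Sketch of the `TwoVec` layout: parallel vectors linked by shared indices.
#[derive(Default)]
struct TwoVec {
    keys: Vec<String>,
    values: Vec<String>,
}

impl TwoVec {
    /// Populate: push the key and value at the same index.
    fn insert(&mut self, key: &str, value: &str) {
        self.keys.push(key.to_owned());
        self.values.push(value.to_owned());
    }

    /// Lookup: scan only the keys, then index into the values.
    fn get(&self, key: &str) -> Option<&str> {
        self.keys
            .iter()
            .position(|k| k.as_str() == key)
            .map(|i| self.values[i].as_str())
    }
}
```

With a forward scan, looking up the most recently inserted keys touches every entry, which is the worst case the note above refers to.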
Learnings
Knowing that we have to both `populate` and `lookup` when creating a `Span`, it's useful to see both together:
Now we see that `OneVec` beats `IndexMap` for all cases, and is more than twice as fast as `IndexMap` for 32 attributes.
So this PR is meant to do three things:
`SpanBuilder` that ends up producing a `Span` that is non-recording pay for indexing that will never get used.
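The combined populate-then-lookup cost discussed under Learnings can be approximated with a small standalone harness. This is only a sketch under stated assumptions: it uses the standard library's `HashMap` as a stand-in for `IndexMap`, `String`/`u64` instead of the real `Key`/`Value` types, and a plain `Instant` timer rather than whatever benchmark framework the PR uses. All function names are hypothetical, and `n` is assumed to be at least 2.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Populate `n` attributes into a Vec, then look up the last two keys
/// (the worst case for a forward linear scan). Returns how many were found.
fn vec_populate_then_lookup(n: usize) -> usize {
    let mut attrs: Vec<(String, u64)> = Vec::with_capacity(n);
    for i in 0..n {
        attrs.push((format!("key{}", i), i as u64));
    }
    let wanted = [format!("key{}", n - 2), format!("key{}", n - 1)];
    wanted
        .iter()
        .filter(|w| attrs.iter().any(|(k, _)| k == *w))
        .count()
}

/// Same workload against a hash map (stand-in for `IndexMap`).
fn map_populate_then_lookup(n: usize) -> usize {
    let mut attrs: HashMap<String, u64> = HashMap::with_capacity(n);
    for i in 0..n {
        attrs.insert(format!("key{}", i), i as u64);
    }
    let wanted = [format!("key{}", n - 2), format!("key{}", n - 1)];
    wanted
        .iter()
        .filter(|w| attrs.contains_key(w.as_str()))
        .count()
}

/// Run `f` repeatedly and return the total elapsed wall-clock time.
fn time_it(iterations: u32, mut f: impl FnMut() -> usize) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the work.
        black_box(f());
    }
    start.elapsed()
}
```

Calling `time_it(100_000, || vec_populate_then_lookup(32))` and the `map_` equivalent gives a rough comparison. Key formatting and allocation are included in the measured time here, so the absolute numbers are not comparable to the charts in this PR.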